
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
    <meta name="description" content="">
    <meta name="author" content="">
    <link rel="icon" href="../../favicon.ico">

    <title>Guide to the OWASP Benchmark v${version}</title>

    <!-- Bootstrap core CSS -->
    <link href="content/css/bootstrap.min.css" rel="stylesheet">

    <!-- Custom styles for this template -->
    <link href="content/dashboard.css" rel="stylesheet">

    <!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
    <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
    <script src="content/js/ie-emulation-modes-warning.js"></script>

    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
    <!--[if lt IE 9]>
      <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
  </head>

  <body>

    <nav class="navbar navbar-inverse navbar-fixed-top">
      <div class="container-fluid">
        <div class="navbar-header">
          <a class="navbar-brand" href="OWASP_Benchmark_Home.html">OWASP Benchmark v${version}</a>
        </div>
        <div id="navbar" class="navbar-collapse collapse">
          <ul class="nav navbar-nav navbar-right">
            <li><a href="OWASP_Benchmark_Home.html">Home</a></li>
			<li class="dropdown">
	          <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Tools<span class="caret"></span></a>
	          <ul class="dropdown-menu">
${toolmenu}
	          </ul>
	        </li>
			<li class="dropdown">
	          <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Vulnerabilities<span class="caret"></span></a>
	          <ul class="dropdown-menu">
${vulnmenu}
 	          </ul>
	        </li>
            <li><a href="OWASP_Benchmark_Guide.html">Guide</a></li>
          </ul>
        </div>
      </div>
    </nav>


    <div class="container">

      <div class="starter-template">

<div>empty</div>
<div>empty</div>

<h2>Introduction</h2>
<p>The OWASP Benchmark is a test suite designed to evaluate the speed, coverage, and accuracy of automated vulnerability detection tools. Without the ability to measure these tools, 
it is difficult to understand their strengths and weaknesses, and compare them to each other. The Benchmark contains thousands of test cases that are fully runnable and exploitable.</p>

<p>You can currently use the Benchmark with Static Application Security Testing (SAST) tools. A future goal is to support the evaluation of Dynamic Application Security Testing (DAST) tools like OWASP ZAP and Interactive Application Security Testing (IAST). The current version is implemented in Java. Future versions may expand to include other languages.</p>

<p>For more information, please visit the <a href="https://www.owasp.org/index.php/Benchmark">OWASP Benchmark Project Site</a>.

<h2>Interpretation Guide</h2>
<p>Security tools (SAST, DAST, and IAST) are amazing when they find a complex vulnerability in your code. But they can drive everyone crazy with complexity, false alarms, and missed vulnerabilities. Using these tools without understanding their strengths and weaknesses can lead to a dangerous false sense of security.</p>

<p>We are on a quest to measure just how good these tools are at discovering and properly diagnosing security problems in applications. We rely on the long history of military and medical evaluation of detection technology as a foundation for our research. Therefore, the test suite tests both real and fake vulnerabilities.</p>

<p>There are four possible test outcomes in the Benchmark:</p>
<ul>
<li>Tool correctly identifies a real vulnerability (True Positive - TP)</li>
<li>Tool fails to identify a real vulnerability (False Negative - FN)</li>
<li>Tool correctly ignores a false alarm (True Negative - TN)</li>
<li>Tool fails to ignore a false alarm (False Positive - FP)</li>
</ul>

<p>We can learn a lot about a tool from these four metrics. A tool that simply flags every line of code as vulnerable will perfectly identify all vulnerabilities in an application, but will also have 100% false positives. Similarly, a tool that reports nothing will have zero false positives, but will also identify zero real vulnerabilities. Imagine a tool that flips a coin to decide whether to report each vulnerability for every test case. The result would be 50% true positives and 50% false positives. We need a way to distinguish valuable security tools from these trivial ones.</p>

<p>If you imagine the line that connects all these points, from 0,0 to 100,100 establishes a line that roughly translates to "random guessing." The ultimate measure of a security tool is how much better it can do than this line. The diagram below shows how we will evaluate security tools against the Benchmark.</p>

<img src="content/benchmark_guide.png"/>

<h3>Key:</h3>

<table class="table">
<tr>
<th>True Positive (TP)</th>
<td>Tests with real vulnerabilities that were correctly reported as vulnerable by the tool.</td>
</tr>

<tr>
<th>False Negative (FN)</th>
<td>Tests with real vulnerabilities that were not correctly reported as vulnerable by the tool.</td>
</tr>

<tr>
<th>True Negative (TN)</th>
<td>Tests with fake vulnerabilities that were correctly not reported as vulnerable by the tool.</td>
</tr>

<tr>
<th>False Positive (FP)</th>
<td>Tests with fake vulnerabilities that were incorrectly reported as vulnerable by the tool.</td>
</tr>

<tr>
<th>True Positive Rate (TPR) = TP / ( TP + FN )</th>
<td>The rate at which the tool correctly reports real vulnerabilities. Also referred to as Recall, as defined at 
<a href="https://en.wikipedia.org/wiki/Precision_and_recall">Wikipedia</a>.</td>
</tr>

<tr>
<th>False Positive Rate (FPR) = FP / ( FP + TN )</th>
<td>The rate at which the tool incorrectly reports fake vulnerabilities as real.</td>
</tr>

<tr>
<th>Score = TPR - FPR</th>
<td>Normalized distance from the random guess line.</td>
</tr>
</table>

	</div>
	
    </div>

	<!-- Bootstrap core JavaScript
    ================================================== -->
    <!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
    <!-- Include all compiled plugins (below), or include individual files as needed -->
    <script src="content/js/bootstrap.min.js"></script>
  </body>
</html>
